Abstract


Reading is an indispensable skill for learners who desire success throughout their academic lives, and
vocabulary knowledge is a sine qua non of reading comprehension. Although the two are inextricably
related, very little has been written about the vocabulary coverage necessary to understand an
expository text, and its equivalent in terms of vocabulary size, in the Turkish EFL context. Therefore, this
study focused on the relationship between the vocabulary coverage and reading comprehension of a
group of foreign language learners. For this study, 178 university students completed a vocabulary checklist
based on the vocabulary items of two different expository texts, and their reading comprehension levels
were measured through two piloted reading comprehension tests for each text. Descriptive statistics,
Pearson’s correlation coefficient and regression analysis were employed to analyze the data. The results revealed
that text-based vocabulary knowledge correlated moderately with reading comprehension, and that there
was a relatively linear relationship between them. It was also concluded that 98% vocabulary coverage
is needed for foreign language learners to comprehend academic texts, and that this coverage corresponds to
approximately the most frequent 8000 word families according to related studies.
Recently, there has been a burgeoning interest in the relationship between reading
comprehension and vocabulary knowledge, which emanates from the fact that
academic achievement is closely related to reading performance (Adamson, 1993;
Collier, 1989). As might be expected, reading is considered the best way to learn new
vocabulary items, and knowing an extensive vocabulary is a prerequisite for understanding
a text (Eskey, 2005). Therefore, it is necessary to scrutinize the breadth of a person’s
vocabulary as a predictor of reading comprehension on a regular basis to reach a
consensus about the amount of vocabulary needed by an L2 learner for a reasonable
comprehension of expository texts. However, although there have been several
studies evaluating the relationship between reading comprehension and vocabulary 
knowledge in general, these studies do not include Turkish learners of English as 
participants whose native language is not etymologically related to English. Scholars, 
on the other hand, acknowledge that English as a foreign language (EFL) learners feel 
the burden of reading in an L2 twice as much as their counterparts, L1 readers, do, and 
success is difficult to come by without being a skilled reader (Carrell & Grabe, 2002). 
For that reason, the present article aims to investigate a perennial concern, namely the
relationship between the vocabulary coverage and reading comprehension of a group
of Turkish EFL learners, because, first, reading is an indispensable skill for academic
achievement and, second, reading comprehension is directly linked to learners’
vocabulary knowledge. After a brief summary of reading comprehension and
vocabulary knowledge as a reciprocal process, the article discusses what percentage
of the vocabulary items in an expository text should be known to comprehend it, and
the corresponding vocabulary size is discussed by analyzing the sizes suggested
in the studies of Nation (2006) and Laufer and Ravenhorst-Kalovski (2010).

A Reciprocal Process: Reading Comprehension and Vocabulary Size 

Reading is delineated as “the process of receiving and interpreting information 
encoded in language form via the medium of print” (Urquhart & Weir, 1998, p. 22). 
As stated by Linan-Thompson and Vaughn (2007), and Grabe and Stoller (2002), 
reading comprehension is the main purpose for reading, and this purpose underlies 
and supports most of the other purposes for reading. However, since reading involves 
cognitive processes, reading comprehension is an invisible concept that can only 
be inferred (Bernhardt, 2011). Moreover, reading comprehension presents some 
challenges for learners as many students consider reading a boring and difficult 
task. When the complexity of reading is considered together with its purposes and 
properties, it becomes clear that reading is complex for both teaching and learning. 

In addition to the complexity of reading, it is unarguably clear that the relationship 
between vocabulary size and reading comprehension is reciprocal. While some scholars 
focus on the effects of vocabulary size on reading comprehension in their studies, some 
others study the effects of reading comprehension on vocabulary size (Eskey, 2005;
Hu & Nation, 2000; Nation, 2001; Nation & Angell, 2006). According to Rumelhart 
(1977) and Stanovich (1980), in evidence-based reading models, bottom-up processes 
such as word recognition and lexical access go hand in hand with top-down processes 
such as integrating background knowledge and processing strategies. Readers need 
automaticity in both word recognition and lexical access (Walter, 2003). From a lexical 
perspective, Anderson (2009) and DeKeyser (2007) summarize this long learning 
process as a path from understanding a word’s meaning to learning a major meaning 
of a word, and then learning many aspects of a word’s meaning and use. Therefore, the 
faster a reader recognizes a word, which is linked to learners’ vocabulary knowledge 
and automaticity, the better reading comprehension will take place. 

Based on these two complex but inextricably related concepts, the studies conducted 
by researchers can be helpful to understand the relationship between the variables 
that might lead to a consensus about the necessary vocabulary size to comprehend an 
expository text. As stated earlier, some studies focus on the effects of vocabulary size 
on reading comprehension while others study the effects of reading comprehension 
on vocabulary growth. However, vocabulary size, in the present study, is seen as 
the predictor of reading comprehension, and the effects of vocabulary coverage on 
reading comprehension are scrutinized within an EFL context. Although it is possible 
to observe many findings about vocabulary as a predictor of early reading achievement 
in first language (L1) settings (Bowey, 1995; Caravolas, Hulme, & Snowling, 2001;
Stanovich, 1986, 2000; Thorndike, 1973), there are not many studies conducted in 
second language (L2) settings. 

In one of the earliest studies, Laufer (1989) aimed to measure the relationship
between the number of English words understood by a reader in an academic text
written in English and the quality of comprehension of the text, working with native Hebrew
and Arabic speakers. The 101 first-year students, who were taking a course in English
for academic purposes, were asked to answer comprehension questions and to
underline the words they could not understand in the text. In the study,
the group which scored 95% and above in the lexical coverage test scored better in the
reading comprehension test than the groups which scored 90-94% and 89% and below.
In other words, Laufer supports the threshold hypothesis in reading comprehension
and suggests that lexical coverage of 95% or above is necessary.

In two further studies, Nation (2006) and Laufer and Ravenhorst-Kalovski (2010)
investigated which vocabulary level provides sufficient reading
comprehension. Both studies focused on the hypothesis that 98% vocabulary
coverage would be needed for adequate comprehension; however, Nation used a
computer program to find a threshold level, while Laufer and Ravenhorst-Kalovski
conducted their study with 735 students who studied at an academic college in Israel
and took an English for academic purposes course. In sum, Nation’s and Laufer and
Ravenhorst-Kalovski’s studies gave nearly the same vocabulary size, on average
7000-8000 word families plus proper nouns, to reach the 98% coverage level in academic texts.

On a similar basis, Hu and Nation (2000) aimed to see what percentage of
coverage in a text was needed for reading for pleasure. The correct answers to
the comprehension questions were compared to the vocabulary levels test scores of
66 adult English as a Second Language (ESL) learners who were attending a pre-university
English course in an English-speaking country. The mother tongues of
the participants were Thai, Chinese, Ni-Vanuatu, Indonesian, Japanese, Korean,
Vietnamese and German. Different densities of unknown words resulted in
differences in comprehension, which was observed through the answers given to the
questions on a fiction text. Also, the hypothesis that comprehension declined
as the number of unknown words increased was confirmed. Although Hu and Nation
(2000) conducted the study to measure adequate comprehension of fiction texts,
they found that the same percentage, 98% coverage, was needed for most learners to
gain adequate comprehension.

Up to this point, the previous studies investigated the relationship between reading 
comprehension and vocabulary size with vocabulary size tests; however, Schmitt, 
Jiang, and Grabe (2011) investigated the relationship on the basis of the exact number 
of the words known in a text. In their study, 661 students from eight different countries 
answered an extended vocabulary checklist test and a reading comprehension test. 
Although a moderate correlation was found between the variables, they could not
identify a threshold at which comprehension increases greatly. For vocabulary
coverage, Schmitt et al. (2011) concluded that 98%-99% coverage is required for
understanding an expository text, but that deeper lexical knowledge does not
necessarily enhance the chances of comprehension to a greater extent.

One of the important limitations of the studies before the one by Schmitt et al.
(2011) was that they analyzed the relationship between reading comprehension
and vocabulary size with vocabulary tests; however, the results were inconclusive
because the tests included only a few items owing to limited time. With an extended checklist,
Schmitt et al. removed this limitation and identified the vocabulary size of the learners
more appropriately. However, they did not analyze the learners’ performances based
on their first languages, and this shortcoming might have caused some fluctuations in
the results, as some learners could have been advantaged by their first
language. Considering the fact that the reading test scores of Turkish EFL learners are
significantly below the average in TOEFL and IELTS (Educational Testing Service,
2012; IELTS, 2011), and given that there is an etymological distance between the
two languages (i.e., Turkish is an Altaic language with an agglutinative morphology
[Durrant, 2013], and English is an Indo-European language), it is important to
scrutinize the relationship between vocabulary size and reading comprehension with
Turkish students of English as well. Since the participants in this study shared the same L1,
differences among L1s did not have a fluctuating effect on the interpretation of
the results. In sum, the following research questions were posed in the present study:

i. Is there a threshold level in terms of vocabulary coverage between adequate and 
inadequate comprehension of an academic text? 

ii. Will different percentages of vocabulary coverage result in differences in reading
comprehension? In particular, will comprehension increase as
vocabulary coverage increases?

iii. What percentage of coverage is necessary to comprehend an expository text at an 
adequate level? 


Method 

Setting and Participants of the Study 

The respondents who took part in this study were enrolled in an English Language
Teaching (ELT) program at a state university in Turkey. To be eligible to study
in undergraduate ELT programs in Turkey, students are required to take an English language
test. This test is a component of the university entrance examination for students
who would like to study in language programs at universities in Turkey. In the current
study, undergraduate respondents who had passed the placement test or completed a one-year
preparatory program were selected as the participant group to determine the
optimal vocabulary size necessary for comprehending expository texts. In
sum, these participants were a group of ELT students who study in this
program for four years in order to work as language teachers at institutions of different levels (i.e., from
primary level to tertiary level).

For the study, a vocabulary checklist and a reading comprehension test were
distributed to a total of 184 students; however, 6 of the respondents were excluded,
since four of them marked more than three non-words and two of them declined to
answer more than half of the reading comprehension test battery. The
study group therefore comprised 178 respondents: 76 freshman,
60 sophomore, 40 junior, and 2 senior students. As for the gender distribution of the
respondents, 71.3% (n = 127) were female and 28.7% (n = 51) were male.
Respondents between the ages of 18 and 23 comprised approximately 90% of the study
group. Although simple random sampling was followed, the population was
represented in the same proportions in terms of gender, as in stratified random sampling.
In this sense, these numbers increased the likelihood of representativeness (Fraenkel,
Wallen, & Hyun, 2012).

Data Collection Instruments 

Vocabulary checklist. In this study, we asked the participants to complete a
vocabulary checklist after they filled in the informed consent form and the demographic
information part. The vocabulary checklist was used to measure the vocabulary coverage
of the students without violating the authenticity of the texts. Since checklist tests
can measure a large number of words, 60.7% of the content words were measured
in this study. In vocabulary research, selecting the target lexical items is one
of the basic and critical steps, and frequency is one of the most important criteria
for selecting these lexical items (Schmitt, 2010). Accordingly, the readings were
submitted to the BNC-20 v 3.2 British National Corpus lists version of the VocabProfilers
program (www.lextutor.ca) to determine the frequency levels of the vocabulary in
the readings. The VocabProfilers program makes it possible to identify the 20,000 most
frequent words in the BNC. The readings used in the study
were analyzed with the program, and a large proportion of the readings (78.60%)
consisted of K1 words, that is, words from the list of the 1000 most frequent words in the
BNC. The percentage of K2, K3 and K4 words was 14.5%. Off-list words, which
may include proper nouns, unusual words, specialist vocabulary, acronyms and
abbreviations, contributed 2.39% to the total. The type-token
ratio, which is the number of different words in the text (types) divided by the
total number of running words (tokens), averaged 0.44. In
other words, the lexical variety was measured as 44%.
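
The type-token ratio described above can be illustrated with a minimal Python sketch. It is not the VocabProfilers procedure: the tokenization rule (lowercased runs of letters and apostrophes) and the function name are assumptions made only for illustration.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return the number of distinct word forms (types) divided by
    the number of running words (tokens)."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    return len(set(tokens)) / len(tokens)


# Hypothetical sample sentence: 11 tokens, 8 distinct types.
sample = "the cat sat on the mat and the dog sat too"
ratio = type_token_ratio(sample)
```

A ratio of 0.44, as reported for the two texts, would mean that on average 44 of every 100 running words are distinct word forms.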

A limitation of such checklists is that the learners’ answers might not be reliable.
To address this problem, an automatic check was built into the test in
order to ensure that the learners’ self-assessments were reliable (Meara, 1992):
22 plausible non-words from the list of Meara and his colleagues were integrated into the sets
of vocabulary items. In sum, a vocabulary checklist comprising 168
words was used to measure the number of words the students knew. This 168-word
checklist made it possible to check 290 words in a short time.

Reading texts and comprehension tests. Following this checklist, a reading
comprehension test was given to the students in order to measure their reading
comprehension level. Three criteria were considered in the text selection process:
student factors (Frantzen, 2003; Levine & Reves, 1998), text factors (Hu & Nation,
2000), and context factors (Diakidoy & Anderson, 1991; Frantzen, 2003; Haastrup,
1991). As the study group comprised university students, two expository
texts which could serve as an archetype for more thorough descriptions of a variety
of scientific genres (Lewin, Fine, & Young, 2001) were chosen from the Science
and Technology section of The Economist. “Research & Development in America -
Bad Medicine” was printed on March 2nd, 2013, and “Exercise and Elderly - Circuit
Training” appeared on September 22nd, 2005. The lengths of the texts were 545 and 578
words respectively. The difficulty analysis was based on the Flesch-Kincaid
Grade Level. The Flesch-Kincaid Grade Levels were 8.7 and 9.3 for these two texts,
which have been roughly “the norm in much past reading research, in order to mimic
the more extended reading of the real world” (Schmitt et al., 2011, p. 30). Considering
the contextual factors and the profile of the participants, the first reading test and its test
battery of Schmitt et al. (2011) were excluded, and a new test battery was developed by
the researchers for “Research & Development in America - Bad Medicine”.
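
For readers unfamiliar with the readability index mentioned above, the following Python sketch shows the standard Flesch-Kincaid Grade Level formula. The study does not report the word, sentence and syllable counts behind the values of 8.7 and 9.3, so the counts in the example are hypothetical, as is the function name.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level computed from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts, not taken from the two Economist texts:
# 100 words, 5 sentences and 150 syllables.
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
```

Longer sentences and more syllables per word both raise the grade level, which is why the index is often used as a rough proxy for text difficulty.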

To reduce the limitations of the study and increase content validity, the first
test battery, for “Exercise and Elderly - Circuit Training”, was adapted from the study
of Schmitt et al. (2011) with their permission and taken as a basis, since they, as
well-known experts in the field of L2 reading, used multiple methods and techniques
to measure the comprehension level of students (Alderson, 2000; Alderson, Clapham,
& Wall, 1995) and piloted the test battery twice in different contexts. Because
their study aimed to analyze the relationship between vocabulary size
and reading comprehension, they included questions that assess the vocabulary
knowledge of the participants but avoided asking about vocabulary items directly. The
second test battery was prepared in line with the aims of the first test battery, and it
included similar questions. For each text, 10 multiple-choice
(MC) questions and 10 graphic organizer (GO) questions were prepared and
incorporated. As put forward by Haladyna (2004), MC questions are efficient and
provide a useful summary of student learning of knowledge and cognitive skills,
especially for large-scale testing programs. In the second part, GOs were used
as a common information transfer task. The respondents were expected to
transfer information from the text to the graphic organizer. Alderson et al. (1995)
describe GOs as information transfer tasks which resemble real-life activities. The
drawback of GOs is the objectivity of marking. To overcome this drawback,
two scorers marked the GOs of the first 30 respondents in line with the answer sheet.
The marks given by the two scorers were consistent; nevertheless, an analytical scoring
instrument was prepared to increase the inter-rater and intra-rater reliability of the results
(Brown, 2003). The rest of the papers were scored by the researcher.

As a novelty of the study, the comprehension questions were written in Turkish,
the first language (L1) of the respondents. No study conducted in Turkey that used the L1
as the medium of comprehension questions could be found in the literature. Many authors, such
as Figueroa and Hernandez (2000), stress the potential harm embedded in the sole
use of the L2 while assessing comprehension. Therefore, native language assessment
forms are seen as the most effective type by some researchers (Lara & August, 1996).
According to Nation (2009), questions in the first language could be worth using if
learners feel comfortable with them. In line with the views of Nation and some other
researchers, we gave a set of Turkish comprehension questions to the respondents
in an attempt to fill this gap in the field. The details of the piloting study are
given in the following section.

Piloting Study 

Language testers are frequently reminded that the qualities of reliability and
validity are essentially in conflict, and it is not easy to design test tasks that are
authentic and at the same time reliable (Bachman & Palmer, 1996). Nevertheless,
this does not mean that either of them can be ignored in the test design process. For the
test battery, the question set for “Circuit Training” was adapted from the study of
Schmitt et al. (2011) with their permission, and 10 MC questions and 10 GOs were
written for the other text, entitled “Bad Medicine”. The content validity of the test
batteries was reviewed by the second researcher of the current study, who is the advisor
of the dissertation and an expert in the field. The opinions of other professors were
also sought. In addition, the texts and the questions were read by a group of English
instructors to check readability and intelligibility and to avoid misunderstandings.
After all these processes, the pilot study was conducted at an ELT program of Gazi
University, Turkey. The piloting population was 136, and the numbers of the female
and male respondents were 116 and 19 respectively. The most frequent method
employed for the internal consistency of test items is the Kuder-Richardson approach,
particularly formulas KR20 and KR21 (Fraenkel et al., 2012). The KR20 reliability
estimates for the reading tests were as follows: 0.72 for the entire reading test, 0.70
for “Bad Medicine” and 0.60 for “Circuit Training”. A value of 0.70 is a desirable
level of reliability, and the reliability level can be considered moderate. For
a more detailed check, the complete test battery was given to two experts
in the Turkish language, who did not observe any linguistic flaws. Consequently, it was
decided to use the same version of the reading test in the study.
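
As an illustration of the KR20 estimate reported above, the following Python sketch computes the coefficient for a matrix of dichotomously scored answers. The response data are invented, the function name is hypothetical, and the use of the population variance of total scores is an assumption, since the study does not state which variance estimator was applied.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson 20 for rows of 0/1 item scores (one row per person)."""
    n_people = len(responses)
    n_items = len(responses[0])
    if n_people < 2 or n_items < 2:
        raise ValueError("need at least two people and two items")
    # Variance of the total scores (population variance is assumed here).
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    # Sum of p * q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_people
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


# Invented data: four test takers, three items.
pilot = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
reliability = kr20(pilot)
```

Values closer to 1 indicate that the items rank test takers more consistently, which is the sense in which the 0.60 to 0.72 estimates above are read as moderate.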

Data Collection and Analysis 

In addition to descriptive statistics, Pearson’s correlation coefficient and regression
analysis were employed to analyze the data. Descriptive statistics are a good way
to get a snapshot of the distribution of the data. To analyze the relationship between
vocabulary coverage and reading comprehension, Pearson’s product-moment
correlation coefficient was used as a measure of bivariate correlation. After
the correlation analysis, we ran a simple regression analysis to predict an outcome
variable from one predictor variable (Field, 2009). Here, the predictor variable
was the vocabulary coverage of the respondents, and the outcome variable was the
achievement of the respondents in the reading comprehension tests.

Results 


Descriptive Statistics 

As given in the method part of the study, there were 168 words in the vocabulary
checklist. However, the first 35 words from the K1 list represented 158 vocabulary
items. Therefore, the number of known words among these 35 was multiplied by 158 and divided
by 35 to estimate the number of known words among the 158 K1 words.
Thus, the number of words measured by the checklist was 290, and all
statistical procedures were carried out based on this number of words. Based on the
290 words, the mean value for the whole checklist was computed as 238.3 (82.17%)
with a standard deviation of 18.83. The values ranged from 179 to 279, and the
variance was 354.63 for the checklist. The mean for the K1 list was about
145, and 11 students (6.2%) knew all 158 words in the K1 list. The number of
students who knew 142 (~90%) or more words in the K1 list was 135 (75.9%). The mean
value of the K2 list (M = 49.90) was also high, and 39 students (21.9%) knew all the words
in the K2 list. The number of students who missed just one word was 38 (21.35%).
The number of students who knew 48 (90%) or more words was 145 (81.46%).
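
The scaling of the K1 sample and the conversion of known words into a coverage percentage can be sketched in Python as follows. The constants come from the description above; the function names are hypothetical and the sketch is only an illustration of the arithmetic, not the authors' scoring script.

```python
K1_SAMPLED = 35        # K1 words actually shown in the checklist
K1_REPRESENTED = 158   # K1 vocabulary items those 35 words stand for
CHECKLIST_TOTAL = 290  # total number of words measured by the checklist


def estimate_k1_known(known_in_sample: int) -> float:
    """Scale the number of known sampled K1 words up to the 158 K1 items."""
    if not 0 <= known_in_sample <= K1_SAMPLED:
        raise ValueError("known_in_sample must be between 0 and 35")
    return known_in_sample * K1_REPRESENTED / K1_SAMPLED


def coverage_percent(known_words: float) -> float:
    """Express a number of known words as a percentage of the 290 words."""
    return 100 * known_words / CHECKLIST_TOTAL


all_k1_known = estimate_k1_known(35)    # every sampled K1 word known
mean_coverage = coverage_percent(238.3)  # mean number of known words
```

With the reported mean of 238.3 known words, the second function reproduces the 82.17% coverage given above.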

In the lists covering the first 5000 words, all the students knew at least one word. However, there were some
students (74 students for the K6 list, 10 students for the K7 list and 22 students for the K8
list) who did not know any of the words in the K6, K7 and K8 lists. Furthermore,
we observed that students had difficulty at the K10, K12, K14 and K16 levels:
166, 155, 159 and 163 students, respectively, did not know a single word in these
lists. According to the mean values for the first 8000 words, the students knew about
235 (86.08%) of the 273 words in the checklist. Nation (2006) concludes that
English language learners (ELLs) should know the first 8000 words to reach a level
of 98% coverage for reading newspapers. Following Nation (2006),
the vocabulary coverage and comprehension level of the students could also be analyzed
in terms of the correlation for the first 8000 words. In line with the results of the current
study, Kirmizi (2014) also revealed that Turkish EFL learners in undergraduate
English language and literature programs had a high level of vocabulary in the K2, K3
and academic vocabulary lists but a moderate level of vocabulary in the K5 list and a low
level of vocabulary in the K10 list.

The mean value for the entire reading test was 22.36 (55.9%), which is
a little higher than 20, half of the questions. The number of
correct answers ranged from 7 (17.5%) to 35 (87.5%). According to these results, none
of the participating students answered all the questions in the test battery correctly. The
best student answered 35 questions correctly, which accounted for 87.5% of the entire test.
Considering the correct answers by test
type, the mean number of correctly answered multiple-choice questions was 10.76,
and the mean number of correctly answered graphic organizer questions was 11.60. The
standard deviations of the MC and GO questions were 2.63 and 4.14 respectively. For the
entire test, the standard deviation was 5.79.


Interplay between Vocabulary Coverage and Reading Comprehension 

For the first research question of the study, we sought a threshold but could
not find one in the results in Table 1 and Table 2. At the level of 88%
coverage, an increase was seen, with a mean score of 26.50. Above this percentage,
text comprehension tended to increase gradually with vocabulary coverage. On the
other hand, this increase was interrupted at the levels of 90%, 92% and 96% coverage. Because
of the dips at these levels, there was no obvious point at which comprehension
accelerated dramatically. The reason for these dips might be the limited
number of respondents who knew 96% and 92% of the content words in the text.
Therefore, the first hypothesis of the study was rejected, since there were
dips at some vocabulary coverage levels. The results of this study are consistent
with the study of Schmitt et al. (2011), as they could not find a threshold at which
comprehension increases sharply either.


Table 1

The Vocabulary Coverage & Reading Comprehension -1-

Coverage (a)        96%     94%     93%     92%     91%     90%     89%     88%
Student Numbers       1       3       3       1       3       6      11       4
Mean (b)          22.00   29.00   28.00   24.00   27.67   24.50   26.82   26.50
Median            22.00   29.00   30.00   24.00   25.00   24.00   28.00   25.00
SD                    -    2.00    6.25       -    4.62    3.51    5.74    6.19
Min.              22.00   27.00   21.00   24.00   25.00   20.00   15.00   21.00
Max.              22.00   31.00   33.00   24.00   33.00   29.00   34.00   35.00

Note. (a) Vocabulary coverage, (b) The mean scores of the 40-question comprehension test.


Table 2

The Vocabulary Coverage & Reading Comprehension -2-

Coverage (a)        87%     86%     85%     84%     83%     82%     81%     80%
Student Numbers      12      17      11      11      18      12       5       6
Mean (b)          24.33   23.94   20.55   24.09   21.67   22.08   21.40   20.50
Median            25.50   22.00   20.00   25.00   20.00   23.00   24.00   20.00
SD                 5.19    6.32    4.93    5.68    4.43    6.32    6.43    5.58
Min.              12.00   11.00    9.00   15.00   15.00    9.00   14.00   14.00
Max.              30.00   34.00   27.00   32.00   29.00   31.00   27.00   30.00

Note. (a) Vocabulary coverage, (b) The mean scores of the 40-question comprehension test.
The mean scores of the comprehension test gave an idea of what percentage of
vocabulary is needed to understand a similar text. This level depends on the degree
of comprehension required. The required degree of comprehension ranges from 55%
to 70% in different studies (Laufer, 1989; Laufer & Sim, 1985; Schmitt et al., 2011),
based on the required pass marks of the courses. If this level is taken as 70%, 93% and
94% vocabulary coverage make texts comprehensible. However, it should not be
ignored that vocabulary coverage over 96% could not be represented in
the study, since there was no respondent above 96% vocabulary coverage. Therefore,
the regression models were used to make up for the missing data in this range and gave predictive
results for the percentages over 96%, as reported in the following sections.

To shed light on the second research question of this study, which is concerned
with whether different percentages of vocabulary coverage result in differences in
reading comprehension, Pearson’s correlation coefficient was computed and found
to be .41 (n = 178, p < .01). According to Cohen (1988), values between .30 and
.49 can be regarded as a moderate correlation. The correlation coefficient indicated
that vocabulary coverage accounted for 17% of the variation in the reading
comprehension level. In other words, 83% of the variation could not
be explained by vocabulary coverage alone. This result matches
the result in the study of Schmitt et al. (2011) (r = .41).
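
The step from r = .41 to 17% of explained variation is the squaring of the correlation coefficient. The Python sketch below shows both the product-moment formula and this conversion; the paired values used for the demonstration are invented and are not the study's data, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def variance_explained(r: float) -> float:
    """Coefficient of determination (r squared) for a simple correlation."""
    return r * r


# Invented, perfectly linear data, only to exercise the function.
demo_r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# r = .41 as reported above: .41 squared is about .17, i.e. 17%.
shared = variance_explained(0.41)
```

The remaining share, 1 minus r squared, is the roughly 83% of variation that vocabulary coverage alone does not account for.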


Table 3

Pearson Product-Moment Correlations between the Variables

                                           Vocabulary Size   Comprehension Test
Vocabulary Size      Pearson Correlation        1.00               .41**
                     Sig. (2-tailed)                               .00
Comprehension Test   Pearson Correlation        .41**              1.00
                     Sig. (2-tailed)            .00
N                                               178                178

** Correlation is significant at the 0.01 level (2-tailed).


Furthermore, we sought a correlation between the vocabulary coverage for the
first 8000 words and the reading comprehension level of the students. The reason
for looking at the first 8000 words in these expository texts was
Nation’s (2006) claim that this is an adequate quantity for reading a newspaper, and
the source of the texts used in this study was also a well-known newspaper, The
Economist. Pearson’s correlation coefficient was r = .44, which means that
there was a positive linear correlation between the vocabulary coverage for the first
8000 words and the reading comprehension level. The Pearson’s correlation values
of the two analyses were also similar (r = .41 and r = .44). In line with the
assumption of Nation (2006), vocabulary size accounted for about 20 percent of the variance
in this study as well. That is, knowledge of the first 8000 words
helped to explain about 20 percent of the variance in students’ scores on the reading
comprehension test. The 2-tailed significance was computed as .00 for this correlation.

In the literature, there is an agreement that vocabulary might not be the best but 
a good predictor of reading (Laufer, 1992; Laufer & Ravenhorst-Kalovski, 2010; 
Nation, 2001). Schmitt et al. (2011) found a moderate correlation (r = .41, n = 661, p < 
.001) between the vocabulary coverage and reading comprehension, and this value is 
the same as the one obtained in the analysis of the current study (r = .41, n = 
178, p < .001). In sum, the aforementioned studies point out that there are moderate 
and high correlations between vocabulary knowledge and reading comprehension no 
matter what the test types are. These results motivate studies on this relationship and 
lead researchers to delve into variables from different viewpoints. 

Optimal Vocabulary Coverage 

After checking the assumptions, such as normality of errors, the histogram of 
standardized residuals and the normal probability plot of the residuals, we carried out a 
regression analysis to answer the third research question. According to the regression 
analysis, the value of b1 is .125, and this value represents the gradient of the regression 
line. It is helpful to think of this value as representing the change in the 
reading comprehension level associated with a unit change in the vocabulary coverage. 
In other words, the model predicts that the reading comprehension score increases by about 
.125 for each one-word increase in the vocabulary coverage. For instance, the model 
predicts that a 10-word increase in the vocabulary size will enable a reader to answer 1.25 
more questions (0.125 x 10 = 1.25). Considering the significance level of .00, 
it can be concluded that the vocabulary coverage made a significant contribution (p < 
.001) to predicting the reading comprehension level of the students. 


Table 4 

The Model Parameters and Significance of These Values 
Coefficients (a) 

                      Unstandardized Coefficients   Standardized Coefficients 
Model                 B        Std. Error           Beta                        t       Sig. 
(Constant)            -7.47    5.06                                             -1.48   .14 
Vocabulary coverage   .125     .02                  .41                         5.91    .00 

a. Dependent Variable: the reading comprehension level 
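
As a rough consistency check on the t value in Table 4, recall that in a regression with a single predictor the t statistic of the slope can be recovered from the correlation as t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below applies this standard identity to the rounded values r = .41 and n = 178. The helper name t_from_r is hypothetical, and we assume that the small difference from the reported 5.91 comes from rounding of r.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for the slope of a one-predictor regression, from r and n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Rounded values taken from Table 3; the unrounded r is not available here.
t_value = t_from_r(0.41, 178)
print(f"t = {t_value:.2f} with {178 - 2} degrees of freedom")
```

With the rounded inputs the result is close to, but not identical with, the t value printed in Table 4.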

So far, it has been discovered that the model in the regression analysis is a 
useful one, and this model significantly improves the ability to predict the reading 
comprehension levels. Therefore, the model is defined by replacing the b-values in 
the regression equation (Table 5): 


Güngör, Yayli / The Interplay between Text-based Vocabulary Size and Reading Comprehension... 


Table 5 

The Model Predicting the Equation between the Vocabulary Coverage and Reading Comprehension 

Outcome = (model) + error 

The comprehension level = -7.47 + (0.13 x the vocabulary size) 
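
The equation in Table 5 can be written as a small function. This is a minimal sketch, assuming the unrounded slope of .125 from Table 4 rather than the rounded 0.13 printed in the equation; the names INTERCEPT, SLOPE and predict_comprehension are our own.

```python
INTERCEPT = -7.47  # constant from Table 4
SLOPE = 0.125      # unstandardized coefficient for vocabulary coverage (Table 4)


def predict_comprehension(known_words: float) -> float:
    """Predicted number of correct answers (out of 40) for a given number of known words."""
    return INTERCEPT + SLOPE * known_words


# A 10-word increase in known words raises the prediction by 0.125 x 10 = 1.25 questions.
gain = predict_comprehension(260) - predict_comprehension(250)
print(f"gain for 10 more known words: {gain:.2f} questions")

# Prediction for a reader who knows all 290 words counted in the texts.
print(f"predicted score at 290 known words: {predict_comprehension(290):.2f}")
```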


For a detailed analysis, a reasonable comprehension score needs to be set as the 
criterion of adequate understanding. For instance, Schmitt et al. (2011) define 60% comprehension 
as adequate, and this percentage corresponds to 24 out of 40 questions. To be able to 
answer 24 questions, readers were expected to know 242 (83.45%) out of 290 words 
in the texts used in this study. On the other hand, Laufer and Sim (1985) and Laufer 
(1989) determine 65-70% and 55% respectively as the minimum comprehension 
scores, and these percentages, for this study, correspond to 26, 28 and 22 right 
answers respectively. If the values of 26, 28 and 22 are entered into the regression 
equation above, the students should know 268, 284 and 236 words out of 290 content 
words respectively. 268 words account for 92.41%, 284 words account for 97.93%, 
and 236 words account for 81.38% of the content words in the texts. 
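
The word counts above follow from solving the regression equation for the vocabulary size: words = (target score + 7.47) / slope. The sketch below is illustrative only, and the function name words_needed is hypothetical. It suggests that the figures for 22, 26 and 28 correct answers are reproduced with the unrounded slope of .125, while the figure of 242 words for 24 correct answers is reproduced with the rounded slope of 0.13 shown in Table 5.

```python
TOTAL_WORDS = 290  # words counted in the two texts


def words_needed(target_score: float, slope: float = 0.125, intercept: float = -7.47) -> float:
    """Invert score = intercept + slope * words to get the words needed for a target score."""
    return (target_score - intercept) / slope


# Targets solved with the unrounded slope (.125) from Table 4.
for score in (22, 26, 28):
    words = round(words_needed(score))
    print(f"{score} correct answers: {words} words ({100 * words / TOTAL_WORDS:.2f}%)")

# Target solved with the rounded slope (0.13) printed in Table 5.
words_60 = round(words_needed(24, slope=0.13))
print(f"24 correct answers: {words_60} words ({100 * words_60 / TOTAL_WORDS:.2f}%)")
```

Under the unrounded slope, the same target of 24 correct answers would correspond to about 252 words, so the choice of slope matters at this level of detail.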

Based on the 70% comprehension level, Hu and Nation (2000) claim that 98% 
coverage is needed for adequate comprehension, and approximately the same 
percentages (98%-99% coverage) are mentioned in Schmitt et al.’s (2011) study as necessary to 
comprehend a text. Furthermore, Nation (2006) suggests that 98% coverage 
is necessary to comprehend newspapers. The results of the present study (97.93%) 
are consistent with these three studies in terms of vocabulary coverage, which means 
that the model relating the vocabulary coverage to reading 
comprehension can be applied to predict the vocabulary size necessary to understand 
the text. By using the model, it can be concluded that Turkish EFL learners at the 
tertiary level need to know 98% of the content words to be able to understand at least 
70% of an expository text. 


Discussion 

Research on L2 reading has indicated that vocabulary knowledge is correlated with 
L2 reading comprehension (Droop & Verhoeven, 2003; Eskey, 2005; Hu & Nation, 
2000; Pike, 1979; Qian, 2002; Schmitt, 2000; Schoonen, Hulstijn, & Bossers, 1998; 
Stahl, 2003). However, the current study attempts to expand the scope of the research 
in terms of direct comparison of vocabulary coverage and reading comprehension. 
Therefore, the results are expected to offer a comprehensive description of the 
vocabulary coverage and reading comprehension of tertiary level Turkish EFL learners. 

According to the results of the study regarding the relationship between variables, 
no threshold appears at which comprehension increases or decreases dramatically at a certain 
percentage of the vocabulary coverage. Instead, the correlation value 
indicates a moderate positive relationship; higher numbers of known words are 
associated with higher scores on the comprehension test. Therefore, it can be said that 
there is a fairly straightforward linear relationship between the vocabulary coverage 
and reading comprehension of the participants. The reason for the moderate correlation 
might be that reading comprehension involves much more than vocabulary knowledge 
(Nation, 2000). To reach a deep level of comprehension, readers are required to 
have some skills such as inferencing, making coherent connections between ideas, 
scrutinizing the ideas with a critical stance, understanding the rationale of authors 
(Graesser, 2007), activating their background knowledge (Rumelhart, 1980) and 
paying attention to the social context in which texts are produced (Lantolf & Thorne, 
2007). Furthermore, readers should actively engage with a text or task to adopt a 
standard of coherence (Nation & Angell, 2006; Perfetti, Landi, & Oakhill, 2005; 
Schmitt et al., 2011). In other words, they should monitor their comprehension and 
make inferences from texts as good readers do. 

Considering the results related to adequate level of comprehension (which is 
considered to be 70% of the text in this study), Turkish EFL readers at tertiary level 
need to know at least 98% of the content words in a text. Even if they know 98%, 
they cannot be expected to comprehend the text fully. These two statements 
are in line with the findings of Hu and Nation (2000) and Schmitt et al. (2011). 
In this vein, it can be suggested that vocabulary knowledge is a prerequisite for 
comprehension. However, as Laufer (1989) pointed out, coverage below 95% does 
not mean that a person cannot understand a text, since some other factors are involved 
in the comprehension process. Even with limited lexical knowledge, some readers 
might enhance their comprehension by benefiting from grammatical clues, and some 
readers make use of their background knowledge and their familiarity with the text 
type to facilitate their comprehension. On the other hand, comprehending expository 
texts might become a challenge even for skilled readers. The reasons behind these 
difficulties might be a lack of knowledge of key concepts and terms, the arrangement of the 
text, and a lack of prior knowledge (Graesser, 2007). 

Implications and Conclusions 

As for the implications of the results for curriculum developers, EFL instructors 
and university students, 98% coverage for academic texts and newspapers refers to 
the knowledge of approximately the most frequent 8000 words based on the studies of 
Nation (2006) and Laufer and Ravenhorst-Kalovski (2010). Therefore, Turkish EFL 
learners at tertiary level need to know about 8000 words to be able to read academic 
texts without having any problems in terms of vocabulary. In Turkey, book authors and 
curriculum designers should carefully examine students’ needs in terms of target vocabulary size, 
and they need to design vocabulary instruction accordingly. It can also be concluded 
that scrupulously designed and aptly delivered vocabulary instruction should always 
be a part of language classes from the beginning of English instruction at primary 
education to the tertiary level. For such instruction, corpus-informed vocabulary 
instruction, explicit or implicit, can be used as an alternative and promising 
methodology (Unaldi, Bardakci, Akpinar, & Dolas, 2013). Corpus-based research might 
also be a starting point for teachers to determine the naturally occurring language in the 
classroom and focus on the unknown vocabulary or wrong usages (Vaughan, 2010), as 
emphasized in Kennedy’s (1998, p. 1) definition of corpus linguistics: “one 
source of evidence for improving descriptions of the structure and use of languages, and 
for various applications including the processing of natural language by machine and 
understanding how to learn or teach a language”. Thus, teachers can improve their own 
awareness in both pre-service and in-service years and understand the usage of lexis 
in the classroom (O’Keeffe, McCarthy, & Carter, 2007). Also, they can teach students 
how to benefit from corpus approaches in the classroom. Accordingly, the vocabulary 
knowledge of students can be increased, and, in turn, enhanced vocabulary knowledge 
might lead to desired levels of reading comprehension. 

Considering the fact that vocabulary size is not the only factor affecting 
reading comprehension, further studies might be carried out by integrating some 
other factors and determinants of reading comprehension and achievement, including 
gender, age, reading goals, topic familiarity, proficiency levels (see Horiba & Fukaya, 
2015), motivation (see Dörnyei, 1994) and the use of reading strategies (see Akyel & 
Ercetin, 2009; Grabe, 2009; Yayli, 2010). For instance, some vocabulary strategies 
such as top-down and bottom-up strategies, using linguistic clues (Kirmizi, 2014), 
and memory strategies (Tilfarlioglu & Bozgeyik, 2012) seemed to be correlated with 
academic success in the Turkish context. In sum, the reading comprehension of EFL learners 
might be delineated by integrating different angles in a holistic way in different 
language settings. 


References 

Adamson, H. D. (1993). Academic competence, theory and classroom practice: Preparing ESL 
students for content courses. New York, NY: Longman. 

Akyel, A., & Ercetin, G. (2009). Hypermedia reading strategies employed by advanced learners of 
English. System, 37(1), 136-152. 

Alderson, J. C. (2000). Assessing reading. Cambridge, UK: Cambridge. 

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. 
Cambridge, UK: Cambridge. 

Anderson, N. (2009). ACTIVE reading: The research base for a pedagogical approach in the 
reading classroom. In Z. H. Han & N. Anderson (Eds.), Second language reading: Research and 
instruction (pp. 117-143). Ann Arbor, MI: University of Michigan Press. 

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing 
useful language tests. Oxford, UK: Oxford University Press. 


EDUCATIONAL SCIENCES: THEORY & PRACTICE 


Bernhardt, E. B. (2011). Understanding advanced second-language reading. New York, NY: 
Routledge. 

Bowey, J. A. (1995). Socioeconomic status differences in preschool phonological sensitivity and 
first-grade reading achievement. Journal of Educational Psychology, 87, 476-487. 

Brown, H. D. (2003). Language assessment: Principles and classroom practices. New York, NY: 
Pearson. 

Caravolas, M., Hulme, C., & Snowling, M. J. (2001). The foundations of spelling ability: Evidence 
from a 3-year longitudinal study. Journal of Memory and Language, 45, 751-774. 

Carrell, P. L., & Grabe, W. (2002). Reading. In N. Schmitt (Ed.), An introduction to applied 
linguistics (pp. 233-250). Great Britain, UK: Arnold. 

Cohen, J. W. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New Jersey, 
NJ: Lawrence Erlbaum Associates. 

Collier, V. P. (1989). How long? A synthesis of research on academic achievement in a second 
language. TESOL Quarterly, 23(3), 509-531. 

DeKeyser, R. (2007). Conclusion: The future of practice. In R. DeKeyser (Ed.), Practice in a 
second language (pp. 287-304). New York, NY: Cambridge University Press. 

Diakidoy, I. A. N., & Anderson, R. C. (1991). The role of contextual information in word 
meaning acquisition during normal reading (Center for the Study of Reading Technical 
Report, 531). Retrieved from https://www.ideals.illinois.edu/bitstream/handle/2142/17820/ 
ctrstreadtechrepvO 1991 i00536_opt.pdf?sequence=l 

Dörnyei, Z. (1994). Motivation and motivating in the foreign language classroom. The Modern 
Language Journal, 78(3), 273-284. 

Droop, M., & Verhoeven, L. (2003). Language proficiency and reading ability in the first and 
second language learners. Reading Research Quarterly, 35(1), 78-103. 

Durrant, P. (2013). Formulaicity in an agglutinating language: The case of Turkish. Corpus 
Linguistics and Linguistic Theory, 9(1), 1-38. 

Educational Testing Service. (2012). Test and score data summary for TOEFL iBT tests and TOEFL 
PBT tests. Retrieved from http://www.ets.org/s/toefl/pdf/94227_unlweb.pdf 

Eskey, D. E. (2005). Reading in a second language. In E. Hinkel (Ed.), Handbook of research in 
second language teaching and learning (pp. 563-580). New Jersey, NJ: Lawrence Erlbaum. 

Field, A. (2009). Discovering statistics using SPSS (3rd ed.). London, UK: Sage. 

Figueroa, R. A., & Hernandez, S. (2000). A report to the nation: Testing Hispanic students in 
the United States. For our nation on the fault line: Hispanic American education. President’s 
Advisory Commission on Educational Excellence for Hispanic Americans. 

Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in 
education (8th ed.). New York, NY: McGraw-Hill. 

Frantzen, D. (2003). Factors affecting how second language Spanish students derive meaning from 
context. The Modern Language Journal, 87(2), 168-199. 

Grabe, W. (2009). Reading in a second language: Moving from theory to practice. New York, NY: 
Cambridge University Press. 

Grabe, W., & Stoller, F. L. (2002). Teaching and researching reading. London, UK: Pearson 
Education Longman. 

Graesser, A. C. (2007). An introduction to strategic reading comprehension. In D. S. McNamara 
(Ed.), Reading comprehension strategies: Theories, interventions and technologies (pp. 3-26). 
New York, NY: Lawrence Erlbaum Associates. 


Haastrup, K. (1991). Lexical inferencing procedures, or, talking about words: Receptive procedures 
in foreign language learning with special reference to English (Vol. 14). Tübingen, Germany: 
Gunter Narr Verlag Tübingen. 

Haladyna, T. M. (2004). Developing and validating multiple-choice test items (3rd ed.). New Jersey, 
NJ: Lawrence Erlbaum Associates. 

Horiba, Y., & Fukaya, K. (2015). Reading and learning from L2 text: Effects of reading goal, topic 
familiarity, and language proficiency. Reading in a Foreign Language, 27(1), 22-46. 

Hu, M., & Nation, I. S. P. (2000). Unknown vocabulary density and reading comprehension. 
Reading in a Foreign Language, 23, 403-430. 

International English Language Testing System. (2011). Test taker performance 2011. Retrieved from 
http://www.ielts.org/researchers/analysis_of_test_data/test_taker_performance_2011.aspx 

Kennedy, G. (1998). An introduction to corpus linguistics. London, UK: Longman. 

Kirmizi, O. (2014). Measuring vocabulary learning strategy use of Turkish EFL learners in relation 
to academic success and vocabulary size. World Journal of Education, 4(6), 16-25. 

Lantolf, J. P., & Thorne, S. L. (2007). Sociocultural theory and second language learning. In B. 
VanPatten & J. Williams (Eds.), Theories in second language acquisition: An introduction (pp. 
197-221). Mahwah, NJ: Lawrence Erlbaum. 

Lara, J., & August, D. (1996). Systemic reform and limited English proficient students. Washington, 
DC: Council of Chief State School Officers. 

Laufer, B. (1989). What percentage of text-lexis is essential for comprehension? In C. Lauren 
& M. Nordman (Eds.), Special language: From humans to thinking machines (pp. 316-323). 
Clevedon, UK: Multilingual Matters. 

Laufer, B. (1992). How much lexis is necessary for reading comprehension? In H. Bejoint & P. 
Arnaud (Eds.), Special language: From humans thinking to thinking machines (pp. 316-323). 
Clevedon, UK: Multilingual Matters. 

Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, 
learners’ vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15-30. 

Laufer, B., & Sim, D. D. (1985). An attempt to measure the threshold of competence for reading 
comprehension. Foreign Language Annals, 18(5), 405-411. 

Levine, A., & Reves, T. (1998). Interplay between reading tasks, reader variables and unknown word 
processing. In E. Alcon & V. Codina (Eds.), Current issues in English language methodology (pp. 
119-132). Castello de la Plana, Spain: Publicacions de la Universitat Jaume. 

Lewin, A., Fine, J., & Young, L. (2001). Expository discourse: A genre-based approach to social 
science research texts. New York, NY: Beverly. 

Linan-Thompson, S., & Vaughn, S. (2007). Research-based methods of reading instruction for 
English language learners, grades K-4. Alexandria, VA: ASCD. 

Meara, M. P. (1992). EFL vocabulary tests (2nd ed.). Swansea, UK: Lognostics. 

Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge, UK: Cambridge 
University Press. 

Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? Canadian 
Modern Language Review, 63(1), 59-82. 

Nation, I. S. P. (2009). Teaching ESL/EFL reading and writing. New York, NY: Routledge. 

Nation, K., & Angell, P. (2006). Learning to read and learning to comprehend. London Review 
of Education, 4(1), 77-87. 


O’Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom: Language use and 
language teaching. Cambridge, UK: Cambridge University Press. 

Perfetti, C. A., Landi, N., & Oakhill, J. (2005). The acquisition of reading comprehension skill. In 
M. J. Snowling & C. Hulme (Eds.), The science of reading: A handbook (pp. 227-247). Oxford, 
UK: Blackwell Publishing. 

Pike, L. (1979). An evaluation of alternative item formats for testing English as a second language. 
TOEFL (Research Reports No. 2). Princeton, NJ: Educational Testing Service. 

Qian, D. D. (2002). Investigating the relationship between vocabulary knowledge and academic reading 
performance: An assessment perspective. Language Learning, 52(3), 513-536. 

Rumelhart, D. E. (1977). Toward an interactive model of reading. In S. Dornic (Ed.), Attention and 
performance VI (pp. 573-603). Hillsdale, NJ: Lawrence Erlbaum Associates. 

Rumelhart, D. E. (1980). Schemata: The building blocks of cognition. In R. J. Spiro, B. C. Bruce, 
& W. E. Brewer (Eds.), Theoretical issues in reading comprehension (pp. 33-58). New Jersey, 
NJ: Lawrence Erlbaum Associates. 

Schmitt, N. (2000). Vocabulary in language teaching. New York, NY: Cambridge University Press. 

Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. Basingstoke, UK: 
Palgrave Macmillan. 

Schmitt, N., Jiang, X., & Grabe, W. (2011). The percentage of words known in a text and reading 
comprehension. The Modern Language Journal, 95, 26-43. 

Schoonen, R., Hulstijn, J., & Bossers, B. (1998). Metacognitive and language-specific knowledge in 
native and foreign language reading comprehension: An empirical study among Dutch students 
in grades 6, 8, and 10. Language Learning, 48, 71-106. 

Stahl, S. A. (2003). Vocabulary and readability: How knowing word meanings affects comprehension. 
Topics in Language Disorders, 23(3), 241-247. 

Stanovich, K. (1986). Matthew effects in reading: Some consequences of individual differences in 
the acquisition of literacy. Reading Research Quarterly, 21, 360-407. 

Stanovich, K. E. (1980). Toward an interactive compensatory model of individual differences in the 
development of reading fluency. Reading Research Quarterly, 16, 32-71. 

Stanovich, K. E. (2000). Progress in understanding reading: Scientific foundations and new 
frontiers. New York, NY: Guilford Press. 

Thorndike, R. (1973). Reading comprehension education in 15 countries. Stockholm, Sweden: 
Almquist & Wiksell. 

Tilfarlioglu, F. Y., & Bozgeyik, Y. (2012). The Relationship between vocabulary learning strategies 
and vocabulary proficiency of English language learners. International Journal of Applied 
Linguistics and English Literature, 1(2), 91-100. 

Unaldi, I., Bardakci, M., Akpinar, K. D., & Dolas, F. (2013). A comparison of contextualized, 
decontextualized and corpus-informed vocabulary instruction: A quasi-experimental 
study. Journal of Language and Literature Education, 2(8), 78-95. 

Urquhart, S., & Weir, C. (1998). Reading in a second language: Process, product and practice. 
New York, NY: Longman. 

Vaughan, E. (2010). How can teachers use a corpus for their own research? In A. O’Keeffe & M. McCarthy 
(Eds.), The Routledge handbook of corpus linguistics (pp. 471-484). London, UK: Routledge. 

Walter, H. C. (2003). Reading in second language. Subject Centre for Languages, Linguistics and Area 
Studies Good Practice Guide. Retrieved from http://www.llas.ac.uk/resources/gpg/1420 

Yayli, D. (2010). A think-aloud study: Cognitive and metacognitive reading strategies of ELT 
department students. Eurasian Journal of Educational Research, 38, 234-251. 

